Search | VHL Regional Portal

1.

An approach for collaborative development of a federated biomedical knowledge graph-based question-answering system: Question-of-the-Month challenges.

Fecho, Karamarie; Bizon, Chris; Issabekova, Tursynay; Moxon, Sierra; Thessen, Anne E; Abdollahi, Shervin; Baranzini, Sergio E; Belhu, Basazin; Byrd, William E; Chung, Lawrence; Crouse, Andrew; Duby, Marc P; Ferguson, Stephen; Foksinska, Aleksandra; Forero, Laura; Friedman, Jennifer; Gardner, Vicki; Glusman, Gwênlyn; Hadlock, Jennifer; Hanspers, Kristina; Hinderer, Eugene; Hobbs, Charlotte; Hyde, Gregory; Huang, Sui; Koslicki, David; Mease, Philip; Muller, Sandrine; Mungall, Christopher J; Ramsey, Stephen A; Roach, Jared; Rubin, Irit; Schurman, Shepherd H; Shalev, Anath; Smith, Brett; Soman, Karthik; Stemann, Sarah; Su, Andrew I; Ta, Casey; Watkins, Paul B; Williams, Mark D; Wu, Chunlei; Xu, Colleen H.

J Clin Transl Sci ; 7(1): e214, 2023.

Article in English | MEDLINE | ID: mdl-37900350

ABSTRACT

Knowledge graphs have become a common approach for knowledge representation. Yet, the application of graph methodology is elusive due to the sheer number and complexity of knowledge sources. In addition, semantic incompatibilities hinder efforts to harmonize and integrate across these diverse sources. As part of The Biomedical Translator Consortium, we have developed a knowledge graph-based question-answering system designed to augment human reasoning and accelerate translational scientific discovery: the Translator system. We have applied the Translator system to answer biomedical questions in the context of a broad array of diseases and syndromes, including Fanconi anemia, primary ciliary dyskinesia, multiple sclerosis, and others. A variety of collaborative approaches have been used to research and develop the Translator system. One recent approach involved the establishment of a monthly "Question-of-the-Month (QotM) Challenge" series. Herein, we describe the structure of the QotM Challenge; the six challenges that have been conducted to date on drug-induced liver injury, cannabidiol toxicity, coronavirus infection, diabetes, psoriatic arthritis, and ATP1A3-related phenotypes; the scientific insights that have been gleaned during the challenges; and the technical issues that were identified over the course of the challenges and that can now be addressed to foster further development of the prototype Translator system. We close with a discussion on Large Language Models such as ChatGPT and highlight differences between those models and the Translator system.

2.

Progress toward a universal biomedical data translator.

Fecho, Karamarie; Thessen, Anne E; Baranzini, Sergio E; Bizon, Chris; Hadlock, Jennifer J; Huang, Sui; Roper, Ryan T; Southall, Noel; Ta, Casey; Watkins, Paul B; Williams, Mark D; Xu, Hao; Byrd, William; Dancík, Vlado; Duby, Marc P; Dumontier, Michel; Glusman, Gustavo; Harris, Nomi L; Hinderer, Eugene W; Hyde, Greg; Johs, Adam; Su, Andrew I; Qin, Guangrong; Zhu, Qian.

Clin Transl Sci ; 2022 May 25.

Article in English | MEDLINE | ID: mdl-35611543

ABSTRACT

Clinical, biomedical, and translational science has reached an inflection point in the breadth and diversity of available data and the potential impact of such data to improve human health and well-being. However, the data are often siloed, disorganized, and not broadly accessible due to discipline-specific differences in terminology and representation. To address these challenges, the Biomedical Data Translator Consortium has developed and tested a pilot knowledge graph-based "Translator" system capable of integrating existing biomedical data sets and "translating" those data into insights intended to augment human reasoning and accelerate translational science. Having demonstrated feasibility of the Translator system, the Translator program has since moved into development, and the Translator Consortium has made significant progress in the research, design, and implementation of an operational system. Herein, we describe the current system's architecture, performance, and quality of results. We apply Translator to several real-world use cases developed in collaboration with subject-matter experts. Finally, we discuss the scientific and technical features of Translator and compare those features to other state-of-the-art, biomedical graph-based question-answering systems.

3.

MEScan: a powerful statistical framework for genome-scale mutual exclusivity analysis of cancer mutations.

Liu, Sisheng; Liu, Jinpeng; Xie, Yanqi; Zhai, Tingting; Hinderer, Eugene W; Stromberg, Arnold J; Vanderford, Nathan L; Kolesar, Jill M; Moseley, Hunter N B; Chen, Li; Liu, Chunming; Wang, Chi.

Bioinformatics ; 37(9): 1189-1197, 2021 06 09.

Article in English | MEDLINE | ID: mdl-33165532

ABSTRACT

MOTIVATION: Cancer somatic driver mutations associated with genes within a pathway often show a mutually exclusive pattern across a cohort of patients. This mutually exclusive mutational signal has been frequently used to distinguish driver from passenger mutations and to investigate relationships among driver mutations. Current methods for de novo discovery of mutually exclusive mutational patterns are limited because the heterogeneity in background mutation rate can confound mutational patterns, and the presence of highly mutated genes can lead to spurious patterns. In addition, most methods only focus on a limited number of pre-selected genes and are unable to perform genome-wide analysis due to computational inefficiency. RESULTS: We introduce a statistical framework, MEScan, for accurate and efficient mutual exclusivity analysis at the genomic scale. Our framework contains a fast and powerful statistical test for mutual exclusivity with adjustment of the background mutation rate and impact of highly mutated genes, and a multi-step procedure for genome-wide screening with the control of false discovery rate. We demonstrate that MEScan more accurately identifies mutually exclusive gene sets than existing methods and is at least two orders of magnitude faster than most methods. By applying MEScan to data from four different cancer types and pan-cancer, we have identified several biologically meaningful mutually exclusive gene sets. AVAILABILITY AND IMPLEMENTATION: MEScan is available as an R package at https://github.com/MarkeyBBSRF/MEScan. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Computational Biology , Neoplasms , Algorithms , Genomics , Humans , Mutation , Neoplasms/genetics
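
The mutual exclusivity signal described in the MEScan abstract can be illustrated with a much simpler pairwise baseline. The sketch below is not the MEScan test: it is a plain lower-tail hypergeometric test for two genes that ignores background mutation rate and highly mutated genes, which are the confounders MEScan is designed to adjust for. The function name and the example counts are hypothetical.

```python
from math import comb


def mutual_exclusivity_pvalue(n_samples, n_a, n_b, n_both):
    """Lower-tail hypergeometric p-value for co-mutation of two genes.

    n_samples: patients in the cohort
    n_a, n_b:  patients with a mutation in gene A / gene B
    n_both:    patients with mutations in both genes
    Returns P(X <= n_both), where X is the co-mutation count expected if
    the two genes were mutated independently of each other.
    """
    smallest = max(0, n_a + n_b - n_samples)   # smallest possible overlap
    largest = min(n_both, n_a, n_b)            # cap at the observed overlap
    numerator = sum(
        comb(n_a, k) * comb(n_samples - n_a, n_b - k)
        for k in range(smallest, largest + 1)
    )
    return numerator / comb(n_samples, n_b)


# Hypothetical cohort: 100 patients, each gene mutated in 30, no co-mutation.
p_exclusive = mutual_exclusivity_pvalue(100, 30, 30, 0)
print(f"p = {p_exclusive:.3g}")
```

A small p-value indicates fewer co-mutations than independence would predict. Because the test treats every patient as equally likely to carry a mutation, it is exposed to the spurious patterns the abstract attributes to heterogeneous background mutation rates.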

4.

GOcats: A tool for categorizing Gene Ontology into subgraphs of user-defined concepts.

Hinderer, Eugene W; Moseley, Hunter N B.

PLoS One ; 15(6): e0233311, 2020.

Article in English | MEDLINE | ID: mdl-32525872

ABSTRACT

Gene Ontology is used extensively in scientific knowledgebases and repositories to organize a wealth of biological information. However, interpreting annotations derived from differential gene lists is often difficult without manually sorting into higher-order categories. To address these issues, we present GOcats, a novel tool that organizes the Gene Ontology (GO) into subgraphs representing user-defined concepts, while ensuring that all appropriate relations are congruent with respect to scoping semantics. We tested GOcats performance using subcellular location categories to mine annotations from GO-utilizing knowledgebases and evaluated their accuracy against immunohistochemistry datasets in the Human Protein Atlas (HPA). In comparison to term categorizations generated from UniProt's controlled vocabulary and from GO slims via OWLTools' Map2Slim, GOcats outperformed these methods in its ability to mimic human-categorized GO term sets. Unlike the other methods, GOcats relies only on an input of basic keywords from the user (e.g. biologist), not a manually compiled or static set of top-level GO terms. Additionally, by identifying and properly defining relations with respect to semantic scope, GOcats can utilize the traditionally problematic relation, has_part, without encountering erroneous term mapping. We applied GOcats in the comparison of HPA-sourced knowledgebase annotations to experimentally-derived annotations provided by HPA directly. During the comparison, GOcats improved correspondence between the annotation sources by adjusting semantic granularity. GOcats enables the creation of custom, GO slim-like filters to map fine-grained gene annotations from gene annotation files to general subcellular compartments without needing to hand-select a set of GO terms for categorization. Moreover, GOcats can customize the level of semantic specificity for annotation categories. Furthermore, GOcats enables a safe and more comprehensive semantic scoping utilization of go-core, allowing for a more complete utilization of information available in GO. Together, these improvements can impact a variety of GO knowledgebase data mining use-cases as well as knowledgebase curation and quality control.

Subjects

Computational Biology/methods , Data Mining/methods , Molecular Sequence Annotation/methods , Algorithms , Databases, Genetic , Gene Ontology/statistics & numerical data , Humans , Knowledge Bases , Software

5.

SSIF: Subsumption-based Sub-term Inference Framework to audit Gene Ontology.

Abeysinghe, Rashmie; Hinderer, Eugene W; Moseley, Hunter N B; Cui, Licong.

Bioinformatics ; 36(10): 3207-3214, 2020 05 01.

Article in English | MEDLINE | ID: mdl-32065617

ABSTRACT

MOTIVATION: The Gene Ontology (GO) is the unifying biological vocabulary for codifying, managing and sharing biological knowledge. Quality issues in GO, if not addressed, can cause misleading results or missed biological discoveries. Manual identification of potential quality issues in GO is a challenging and arduous task, given its growing size. We introduce an automated auditing approach for suggesting potentially missing is-a relations, which may further reveal erroneous is-a relations. RESULTS: We developed a Subsumption-based Sub-term Inference Framework (SSIF) by leveraging a novel term-algebra on top of a sequence-based representation of GO concepts along with three conditional rules (monotonicity, intersection and sub-concept rules). Applying SSIF to the October 3, 2018 release of GO suggested 1938 unique potentially missing is-a relations. Domain experts evaluated a random sample of 210 potentially missing is-a relations. The results showed SSIF achieved a precision of 60.61, 60.49 and 46.03% for the monotonicity, intersection and sub-concept rules, respectively. AVAILABILITY AND IMPLEMENTATION: SSIF is implemented in Java. The source code is available at https://github.com/rashmie/SSIF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Software , Gene Ontology

6.

Advances in gene ontology utilization improve statistical power of annotation enrichment.

Hinderer, Eugene W; Flight, Robert M; Dubey, Rashmi; MacLeod, James N; Moseley, Hunter N B.

PLoS One ; 14(8): e0220728, 2019.

Article in English | MEDLINE | ID: mdl-31415589

ABSTRACT

Gene-annotation enrichment is a common method for utilizing ontology-based annotations in gene and gene-product centric knowledgebases. Effective utilization of these annotations requires inferring semantic linkages by tracing paths through edges in the ontological graph, referred to as relations. However, some relations are semantically problematic with respect to scope, necessitating their omission or modification lest erroneous term mappings occur. To address these issues, we created the Gene Ontology Categorization Suite, or GOcats, a novel tool that organizes the Gene Ontology into subgraphs representing user-defined concepts, while ensuring that all appropriate relations are congruent with respect to scoping semantics. Here, we demonstrate the improvements in annotation enrichment by re-interpreting edges that would otherwise be omitted by traditional ancestor path-tracing methods. Specifically, we show that GOcats' unique handling of relations improves enrichment over conventional methods in the analysis of two different gene-expression datasets: a breast cancer microarray dataset and several horse cartilage development RNAseq datasets. With the breast cancer microarray dataset, we observed significant improvement (one-sided binomial test p-value = 1.86E-25) in 182 of 217 significantly enriched GO terms identified from the conventional path traversal method when GOcats' path traversal was used. We also found new significantly enriched terms using GOcats, whose biological relevancy has been experimentally demonstrated elsewhere. Likewise, on the horse RNAseq datasets, we observed a significant improvement in GO term enrichment when using GOcats' path traversal: one-sided binomial test p-values range from 1.32E-03 to 2.58E-44.

Subjects

Gene Ontology , Molecular Sequence Annotation , Animals , Breast Neoplasms/genetics , Computational Biology , Databases, Genetic , Female , Horses/genetics , Humans , Knowledge Bases
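
The one-sided binomial test cited in the abstract above (182 of 217 terms improved) can be sketched as an exact upper-tail sum. The abstract does not state the null hypothesis; the sketch assumes each term is equally likely to improve or not (null probability 0.5), and the function name is hypothetical. Under that assumption the result comes out on the order of 1E-25, in line with the reported p-value.

```python
from fractions import Fraction
from math import comb


def binomial_upper_tail(successes, trials, p_null=Fraction(1, 2)):
    """Exact one-sided binomial p-value P(X >= successes).

    X ~ Binomial(trials, p_null). The default p_null = 1/2 is an
    assumption; the abstract does not state the null probability.
    """
    p = Fraction(p_null)
    # Sum the exact probabilities of every outcome at least as extreme.
    tail = sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )
    return float(tail)


# Counts taken from the abstract: 182 improved terms out of 217.
p_improved = binomial_upper_tail(182, 217)
print(f"p = {p_improved:.3g}")
```

Exact rational arithmetic is used so that the very small tail probability is not lost to floating-point underflow before the final conversion.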
